Combination of multiple bipartite ranking for multipartite web content quality evaluation

Authors

  • Xiao-Bo Jin
  • Guanggang Geng
  • Minghe Sun
  • Dexian Zhang
Abstract

Web content quality evaluation is crucial to various web content processing applications. Bagging achieves strong classification performance by combining multiple classifiers. In this study, similar to Bagging, multiple pairwise bipartite ranking learners are combined to solve the multipartite ranking problem for web content quality evaluation. Both encoding and decoding mechanisms are used to combine bipartite rankers to form a multipartite ranker; hence, the multipartite ranker is called MultiRank.ED. Both binary encoding and ternary encoding extend each rank value to an (L − 1)-dimensional vector for a ranking problem with L different rank values. Predefined weighting and adaptive weighting decoding mechanisms are used to combine the ranking results of the bipartite rankers to obtain the final ranking results. In addition, some theoretical analyses of the encoding and decoding strategies in the MultiRank.ED algorithm are provided. Computational experiments using the DC2010 datasets show that the combination of binary encoding and predefined weighting decoding yields the best performance among the four combinations. Furthermore, this combination performs better than the winning method of the DC2010 competition.

Preprint submitted to Neurocomputing, August 21, 2014.
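
The encoding and decoding steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything below is an assumption: the threshold-style binary code (component k is +1 when the rank exceeds k, else −1), the weighted-sum decoding, and the function names binary_encode and decode are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def binary_encode(rank: int, num_ranks: int) -> List[int]:
    """Map a rank in 1..num_ranks to a (num_ranks - 1)-dimensional +1/-1 vector.

    Assumed scheme (not confirmed by the source): component k (1-based)
    is +1 when rank > k and -1 otherwise, so each component defines one
    bipartite (two-class) ranking subproblem.
    """
    if not 1 <= rank <= num_ranks:
        raise ValueError("rank must lie in 1..num_ranks")
    return [1 if rank > k else -1 for k in range(1, num_ranks)]


def decode(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine the scores of the L - 1 bipartite rankers for one item.

    Assumed decoding: a weighted sum with fixed, predefined weights.
    Items are then ordered by the combined score.
    """
    if len(scores) != len(weights):
        raise ValueError("one weight per bipartite ranker is required")
    return sum(w * s for w, s in zip(weights, scores))
```

Under these assumptions, a problem with L = 4 rank values uses three bipartite rankers: binary_encode(3, 4) gives the target vector [1, 1, -1], and decode([0.5, -0.5, 1.0], [1, 1, 1]) combines three ranker scores with uniform weights.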

Similar resources

Combination of Multiple Bipartite Ranking for Web Content Quality Evaluation

Web content quality estimation is crucial to various web content processing applications. Our previous work applied Bagging + C4.5, a combination of many point-wise ranking models, to achieve the best results on the ECML/PKDD Discovery Challenge 2010. In this paper, we combine multiple pair-wise bipartite ranking learners to solve the multi-partite ranking problems for the web quality...

A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing search engine results. Ranking takes place according to one or more parameters, such as backward links, forward links, content, and click-through rate. The quality and performance of these algorithms depend on the listed parameters. Ranking is one of the most important components of the search engine that represents the degree of the vitality...

Effective Learning to Rank Persian Web Content

The Persian language is one of the most widely used languages in the Web environment. Hence, the Persian Web includes invaluable information that needs to be retrieved effectively. As with other languages, ranking algorithms for Persian Web content deal with different challenges, such as applicability issues in real-world situations as well as the lack of user modeling. CF-Rank, as a ...

A Study on Ranking Method in Retrieving Web Pages Based on Content and Link Analysis: Combination of Fourier Domain Scoring and PageRank Scoring

The ranking module is an important component of the search process that sorts the relevant pages. Since a collection of Web pages has additional information inherent in the hyperlink structure of the Web, this information can be represented as a link score and then combined with the content score of the usual information retrieval techniques. In this paper we report our studies about ranking score of Web pages combined...

A New Model for Phrase Search Based on Weighted Minimum Displacement

Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user's observation, which increases the complexity of ranking algorithms. Another issue is that users often explore just the first 10 to 20 results, while millions of pages related to a query may exist. So search engines have to use sui...

Journal title:
  • Neurocomputing

Volume 149

Publication date: 2015